103 research outputs found

    Statistical approaches for natural language modelling and monotone statistical machine translation

    This thesis gathers several contributions to statistical pattern recognition and, more specifically, to several natural language processing tasks. Several well-known statistical techniques are revisited, namely parameter estimation, loss function design, and statistical modelling. These techniques are applied to natural language processing tasks such as document classification, natural language modelling, and statistical machine translation. Regarding parameter estimation, we address the smoothing problem by proposing a new technique, constrained domain maximum likelihood estimation (CDMLE). CDMLE avoids the need for a smoothing stage, which causes the properties of the maximum likelihood estimator to be lost. The technique is applied to document classification with the Naive Bayes classifier. CDMLE is then extended to leaving-one-out maximum likelihood estimation and applied to language model smoothing. The results obtained on several natural language modelling tasks show an improvement in terms of perplexity. Regarding the loss function, the design of loss functions other than the 0-1 loss is studied carefully. The study focuses on loss functions that retain a decoding complexity similar to that of the 0-1 loss while providing greater flexibility. We analyse and present several loss functions on several machine translation tasks and with several translation models. We also analyse some translation rules that stand out for practical reasons, such as the direct translation rule, and we deepen the understanding of log-linear models, which are in fact particular cases of loss functions. Finally, several monotone translation models based on statistical modelling techniques are proposed.
    Andrés Ferrer, J. (2010). Statistical approaches for natural language modelling and monotone statistical machine translation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/7109

    Constrained domain maximum likelihood estimation and the loss function in statistical pattern recognition

    In this thesis we present a new estimation algorithm for statistical models which does not incur over-training problems. This new estimation technique, the so-called constrained domain maximum likelihood estimation (CDMLE), retains all the theoretical properties of maximum likelihood estimation and, furthermore, does not produce overtrained parameter sets. In addition, the implications of the 0-1 loss function assumption are analysed in pattern recognition tasks. Specifically, more versatile loss functions are designed without increasing the cost of the optimal classification rule. This approach is applied to the statistical machine translation problem.
    Andrés Ferrer, J. (2008). Constrained domain maximum likelihood estimation and the loss function in statistical pattern recognition. http://hdl.handle.net/10251/13638

    Discriminative Bernoulli HMMs for isolated handwritten word recognition

    [EN] Bernoulli HMMs (BHMMs) have been successfully applied to handwritten text recognition (HTR) tasks such as continuous and isolated handwritten words. BHMMs belong to the generative model family and, hence, are usually trained by (joint) maximum likelihood estimation (MLE) by means of the Baum-Welch algorithm. Despite the good properties of the MLE criterion, there are better training criteria such as maximum mutual information (MMI). MMI is the most widespread criterion for training discriminative models such as log-linear (or maximum entropy) models. Inspired by a BHMM classifier, in this work a log-linear HMM (LLHMM) for binary data is proposed. The proposed model is proved to be equivalent to the BHMM classifier and, in this way, a discriminative training framework for BHMM classifiers is defined. The behavior of the proposed discriminative training framework is studied in depth on a well-known isolated word recognition task, the RIMES database. (C) 2013 Elsevier B.V. All rights reserved.
    Work supported by the EC (FEDER/FSE) and the Spanish MEC/MICINN under the MIPRCV "Consolider Ingenio 2010" program (CSD2007-00018), iTrans2 (TIN2009-14511) and MITTRAL (TIN2009-14633-C03-01) projects. Also supported by the IST Programme of the European Community, under the PASCAL2 Network of Excellence, IST-2007-216886, and by the Spanish MITyC under the erudito.com (TSI-020110-2009-439).
    Giménez Pastor, A.; Andrés Ferrer, J.; Juan, A. (2014). Discriminative Bernoulli HMMs for isolated handwritten word recognition. Pattern Recognition Letters. 35:157-168. https://doi.org/10.1016/j.patrec.2013.05.016

    Window repositioning for Printed Arabic Recognition

    [EN] Bernoulli HMMs are conventional HMMs in which the emission probabilities are modeled with Bernoulli mixtures. They have recently been applied, with good results, in off-line text recognition in many languages, in particular, Arabic. A key idea that has proven to be very effective in this application of Bernoulli HMMs is the use of a sliding window of adequate width for feature extraction. This idea has allowed us to obtain very competitive results in the recognition of both Arabic handwriting and printed text. Indeed, a system based on it ranked first at the ICDAR 2011 Arabic recognition competition on the Arabic Printed Text Image (APTI) database. More recently, this idea has been refined by using repositioning techniques for extracted windows, leading to further improvements in Arabic handwriting recognition. In the case of printed text, this refinement led to an improved system which ranked second at the ICDAR 2013 second competition on APTI, only at a marginal distance from the best system. In this work, we describe the development of this improved system. Following evaluation protocols similar to those of the competitions on APTI, exhaustive experiments are detailed from which state-of-the-art results are obtained.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/ICT-287755) under grant agreement no. 287755. The research is also supported by the Spanish Government (Plan E, iTrans2 TIN2009-14511 and AECID 2011/2012 grant).
    Alkhoury, I.; Giménez Pastor, A.; Juan, A.; Andrés Ferrer, J. (2015). Window repositioning for Printed Arabic Recognition. Pattern Recognition Letters. 51:86-93. https://doi.org/10.1016/j.patrec.2014.08.009

    Amyotrophic lateral sclerosis, gene deregulation in the anterior horn of the spinal cord and frontal cortex area 8: implications in frontotemporal lobar degeneration

    Transcriptome arrays identify 747 genes differentially expressed in the anterior horn of the spinal cord and 2,300 genes differentially expressed in frontal cortex area 8 in a single group of typical sALS cases without frontotemporal dementia compared with age-matched controls. The main up-regulated clusters in the anterior horn are related to inflammation and apoptosis; down-regulated clusters are linked to axoneme structures and protein synthesis. In contrast, up-regulated gene clusters in frontal cortex area 8 involve neurotransmission, synaptic proteins and vesicle trafficking, whereas the main down-regulated genes cluster into oligodendrocyte function and myelin-related proteins. RT-qPCR validates the expression of 58 of 66 assessed genes from different clusters. The present results: a. reveal regional differences in de-regulated gene expression between the anterior horn of the spinal cord and frontal cortex area 8 in the same individuals suffering from sALS; b. validate and extend our knowledge about the complexity of the inflammatory response in the anterior horn of the spinal cord; and c. identify for the first time extensive up-regulation of neurotransmission and synaptic-related genes, together with significant down-regulation of oligodendrocyte- and myelin-related genes, as important contributors to the pathogenesis of frontal cortex alterations in the sALS/frontotemporal lobar degeneration spectrum complex at stages with no apparent cognitive impairment.

    Arabic Printed Word Recognition Using Windowed Bernoulli HMMs

    [EN] Hidden Markov Models (HMMs) are now widely used for off-line text recognition in many languages and, in particular, Arabic. In previous work, we proposed to use columns of raw, binary image pixels, which are fed directly into embedded Bernoulli (mixture) HMMs, that is, embedded HMMs in which the emission probabilities are modeled with Bernoulli mixtures. The idea was to bypass feature extraction and to ensure that no discriminative information is filtered out during feature extraction, which in some sense is integrated into the recognition model. More recently, we extended the column bit vectors by means of a sliding window of adequate width to better capture image context at each horizontal position of the word image. However, these models might have limited capability to properly model vertical image distortions. In this paper, we consider three methods of window repositioning after window extraction to overcome this limitation. Each sliding window is translated (repositioned) to align its center with its center of mass. Using this approach, state-of-the-art results are reported on the Arabic Printed Text Image (APTI) database.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 287755. Also supported by the Spanish Government (Plan E, iTrans2 TIN2009-14511 and AECID 2011/2012 grant).
    Alkhoury, I.; Giménez Pastor, A.; Juan Císcar, A.; Andrés Ferrer, J. (2013). Arabic Printed Word Recognition Using Windowed Bernoulli HMMs. Lecture Notes in Computer Science. 8156:330-339. https://doi.org/10.1007/978-3-642-41181-6_34
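
    As an illustration only, the repositioning idea in the entry above (translate each sliding window so that its center aligns with its center of mass) might be sketched as follows. This is not the authors' implementation: the function names `reposition_window` and `windowed_features`, the zero padding at the image borders, and the choice of vertical-only repositioning are all assumptions made for this sketch. The abstract mentions three repositioning methods without specifying them.

```python
import numpy as np


def reposition_window(window: np.ndarray) -> np.ndarray:
    """Shift a binary window vertically so that its center of mass
    lands on the middle row (assumed vertical-only repositioning)."""
    height = window.shape[0]
    mass = window.sum()
    if mass == 0:
        # An empty window has no center of mass; leave it unchanged.
        return window.copy()
    rows = np.arange(height)
    com_row = (window.sum(axis=1) * rows).sum() / mass
    shift = int(round((height - 1) / 2 - com_row))
    out = np.zeros_like(window)
    if shift >= 0:
        # Move content down by `shift` rows, filling the top with zeros.
        out[shift:, :] = window[:height - shift, :]
    else:
        # Move content up by `-shift` rows, filling the bottom with zeros.
        out[:height + shift, :] = window[-shift:, :]
    return out


def windowed_features(image: np.ndarray, width: int) -> np.ndarray:
    """Return one flattened, repositioned window per image column."""
    n_cols = image.shape[1]
    half = width // 2
    # Zero-pad horizontally so every column has a full-width window.
    padded = np.pad(image, ((0, 0), (half, half)))
    feats = []
    for col in range(n_cols):
        win = padded[:, col:col + width]
        feats.append(reposition_window(win).reshape(-1))
    return np.stack(feats)
```

    Each row of the returned array is a binary vector that could serve as the observation at one horizontal position of the word image.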

    Inflammatory Gene Expression In Whole Peripheral Blood At Early Stage Of Sporadic Amyotrophic Lateral Sclerosis

    Objective: Characterization of altered expression of selected transcripts linked to inflammation in the peripheral blood of sporadic amyotrophic lateral sclerosis (sALS) patients at an early stage of disease, to increase knowledge about the peripheral inflammatory response in sALS. Methods: RNA expression levels of 45 genes were assessed by RT-qPCR in 22 sALS cases in parallel with 13 age-matched controls. Clinical and serum parameters were assessed at the same time. Results: Upregulation of genes coding for factors involved in leukocyte extravasation (ITGB2, INPP5D, SELL, and ICAM1) and extracellular matrix remodeling (MMP9 and TIMP2), as well as downregulation of certain chemokines (CCL5 and CXC5R), anti-inflammatory cytokines (IL10, TGFB2, and IL10RA), pro-inflammatory cytokines (IL-6), and T-cell regulators (CD2 and TRBC1), was found in sALS cases independently of gender, clinical symptoms at onset (spinal, respiratory, or bulbar), progression, peripheral leukocyte number, and integrity of RNA. MMP9 levels positively correlated with age, whereas CCR5, CCL5, and TRBC1 negatively correlated with age in sALS but not in controls. Relatively higher TNFA expression levels correlate with higher creatine kinase protein levels in plasma. Conclusion: The present findings show early inflammatory responses characterized by upregulation of factors enabling extravasation of leukocytes and extracellular matrix remodeling in blood in sALS cases, in addition to increased TNFA levels paralleling skeletal muscle damage.

    Speaker-adapted confidence measures for speech recognition of video lectures

    [EN] Automatic speech recognition applications can benefit from a confidence measure (CM) to predict the reliability of the output. Previous work showed that a word-dependent naive Bayes (NB) classifier outperforms the conventional word posterior probability as a CM. However, a discriminative formulation usually yields improved performance due to the available training techniques. Taking this into account, we propose a logistic regression (LR) classifier defined with simple input functions to approximate the NB behaviour. Additionally, as a main contribution, we propose to adapt the CM to the speaker in cases in which it is possible to identify the speakers, such as online lecture repositories. The experiments have shown that speaker-adapted models outperform their non-adapted counterparts on two difficult tasks from English (videoLectures.net) and Spanish (poliMedia) educational lectures. They have also shown that the NB model is clearly superseded by the proposed LR classifier.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 287755. Also supported by the Spanish MINECO (iTrans2 TIN2009-14511 and Active2Trans TIN2012-31723) research projects and the FPI Scholarship BES-2010-033005.
    Sanchez-Cortina, I.; Andrés Ferrer, J.; Sanchis Navarro, JA.; Juan Císcar, A. (2016). Speaker-adapted confidence measures for speech recognition of video lectures. Computer Speech and Language. 37:11-23. https://doi.org/10.1016/j.csl.2015.10.003